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Abstract 



The stabilization algorithm of Weisfeiler and Leman has as an input any square matrix 
A of order n and returns the minimal cellular (coherent) algebra which includes A. 

In case when A = A(T) is the adjacency matrix of a graph T the algorithm examines 
all configurations in V having three vertices and, according to this information, partitions 
vertices and ordered pairs of vertices into equivalence classes. The resulting construction 
allows to associate to each graph T a matrix algebra W(T) := W (A(T)) which is an invariant 
of the graph T. For many classes of graphs, in particular for most of the molecular graphs, 
the algebra VF(T) coincides with the centralizer algebra of the automorphism group Aut(r). 
In such a case the partition returned by the stabilization algorithm is equal to the partition 
into orbits of Aut(r). 

We give algebraic and combinatorial descriptions of the Weisfeiler-Leman algorithm and 
present an efficient computer implementation of the algorithm written in C. The results ob- 
tained by testing the program on a considerable number of examples of graphs, in particular 
on some chemical molecular graphs, are also included. 

1 Introduction 

The fundamental problem of graph symmetry perception arises in numerous areas of chemistry 
and physics. In this context molecules are modelled by graphs where the vertices represent 
atoms and the edges represent bonds. The aim is to find equivalence classes of "elements" (e.g. 
vertices, edges, pairs of vertices, subgraphs etc.) of the graph, or more rigorously, the orbits of 
the action of the automorphism group Aut(T) of the graph T on the set of all "elements" of a 
considered "mode" (for example the orbits of Aut(T) on the set of vertices). It is clear that in 
order to use such a statement one has to get at some intermediate stage a convenient description 
of Aut(T). 

However, usually chemists avoid the computation of Aut(T), preferring to use certain invariants 
in order to get a classification of "elements" of the graph. 

Let us express the above claim more accurately for the case of vertices. A vertex invariant in a 
graph T is a property or a parameter of vertices which is preserved by any of its automorphisms, 
i.e. the property does not depend on the labelling of the graph. Let be a function which is 
defined on the set of all vertices of T. Then (f> is an invariant of vertices if cf>(x) = <j)(x') whenever 
the vertices x, x' belong to the same orbit of Aut(T). 

Let us now consider a certain invariant (a set of invariants). Then we may define that two 
vertices belong to the same equivalency class if and only if they have the same value of the 
invariant (invariants) being taken into account. Whatever the classification approach, the re- 
sulting partition cannot be finer than the partition into the orbits of the automorphism group 
(the automorphism partition). 

From an algorithmic point of view the main goal in classification of vertices is to find an algorithm 
which ensures to produce the automorphism partition of a graph and which is known to be 
theoretically efficient (this means that the running time is restricted by a polynomial in the size 
of the input). However, all known polynomial time methods for the automorphism partitioning 
problem yield heuristic solutions, i.e. they result in partitions where the equivalence classes are 
orbits or unions of orbits. 
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It is well known (see e.g. [ReaC77] , [Pon94c] ) that the automorphism partitioning problem 
(which is strongly related to the problem of graph symmetrj0) is algorithmically equivalent 
to the graph isomorphism problem (the problem of graph identification) in the following sense. 
Assume that a polynomial algorithm is given solving one of the two problems. Then it is possible 
to construct from this algorithm a second polynomial algorithm which solves the other problem. 
However, both problems seem to be hard from a computational point of view. Therefore heuristic 
approaches are still considered as practically helpful. 

The simplest of these approaches are restricted to a classification of vertices (in chemical terms 
they determine atom equivalence only) and are based on different techniques, often applied 
iteratively, using the valencies of the vertices. The major weakness of these methods is that 
they will not give any partition for regular graphs (graphs where all vertices have the same 
valencies). For that reason it seems quite natural to extend these techniques to a classification 
of vertices and edges (i.e. atoms and pairs of atoms). Now, not only configurations of two 
vertices (which define the valencies) but also configurations consisting of three vertices have 
to be considered. This is exactly the basic idea of the algorithm of Weisfeiler and Leman. It 
partitions all vertices and ordered pairs of vertices of a graph by examining all ordered triples 
of vertices. This approach turns out to be the most powerful method in some class of graph 
symmetry perception algorithms. 

The algebraic object which is constructed in this way and which has been introduced by Weis- 
feiler and Leman in [WciL68j is a cellular algebra. It is an invariant of the underlying graph. 
Under a different point of view (without any relation to the automorphism partitioning prob- 
lem) this object has also been found and called coherent configuration by Higman in [Hig70|. 
In greater detail, Weisfeiler-Leman's approach was described (in English) in [Wei76]. However, 
during approximately twenty years, their ideas were completely unknown and neglected in math- 
ematical chemistry. Nowadays, the approach itself and its interrelations with the identification 
and symmetry perception of graphs are rather familiar to the experts in algebraic combina- 
torics (see e.g. |Hig87| , [Fri89] . |FarIK90j . |Pon93bj . |Pon94bj . |FarKM94j ) . however a careful 
investigation of its abilities still remains a topical problem. 

More or less the same as Weisfeiler-Leman's approach was independently elaborated by G. 
Tinhofer (partly in joint work with his student J. Hinteregger) in [Tin75| and [HinT77j, however 
without explicitly describing the resulting algebraic object. In 1989, Ch. and G. Riicker (a 
chemist and a mathematician) realized the necessity of having a method for the partition of 
atom pairs, and produced a heuristic computer program for that purpose by simple reasoning 
without using any group theoretical machinery [RucR90a] , [RueR90b] , [RueR9l] . 

The main purpose of our paper is not only to draw attention to the algorithm of Weisfeiler 
and Leman but, mainly, to present a good and practical program implementation. To our 
knowledge, no such implementation exists up to now. Only few attempts were made in the 
past. A first version of our program has been described by I.V. Chuvaeva, M. Klin and D.V. 
Pasechnik in [ChuKP92]. By means of a very careful revision we now realized all advantages and 
disadvantages of this implementation. Recently, L. Babel established in [Bab95j the theoretical 
complexity of the Weisfeiler-Leman algorithm and, using these considerations, also created a 
computer program. A detailed description is given in [BabBLT97]. Our common experiences 

1 In this paper we are not interested in finding the automorphism group of a graph (what is commonly meant 
by solving the problem of graph symmetry) but rather in finding the automorphism partition. 
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(including some important suggestions of Ch. Pech) enabled us to modify the version of the 
program by Chuvaeva et al. into a very fast program implementation. A comparison with the 
program of Babel et al. shows that our program, although inferior with respect to theoretical 
complexity, is much more efficient from a practical point of view. 

This paper is organized as follows. In Section 2 we introduce the standard terminology and 
some basic definitions. After that a brief survey on previous approaches to graph stabilization is 
given in Section 3. Section 4 contains the definitions of a cellular algebra and related algebraic 
objects, states some important properties and interpretations and introduces the cellular algebra 
which is associated to a given graph. In Section 5 an algebraic description of the algorithm of 
Weisfeiler and Leman is presented, followed in Section 6 by a more illustrating combinatorial 
interpretation. Thus, the algorithm is exposed from two different points of view, the first using 
matrix notation, the second using graph theoretical notation. Sections 7 and 8 give a description 
of the program implementation and an estimation of its complexity. Furthermore, in a brief 
excursion we present the main ideas of the complexity considerations and of the implementation 
of Babel's algorithm. Finally, in Section 9, extended testing of our program on a large number 
of examples is documented in order to demonstrate its capability. We conclude with a discussion 
in Section 10. 

This work is the second contribution in a series of papers [KliRRT99] , [TinK99] concerning dif- 
ferent aspects of algebraic combinatorics with emphasis on applications in mathematical chem- 
istry. The series introduces the basic concepts of algebraic combinatorics and presents some 
of the main features and tools for perception of symmetry properties of combinatorial objects. 
Those readers who are not familiar with mathematical standard definitions and notations such 
as matrix, group, basis, equivalence class, etc. are referred to the first paper [KliRRT99] in this 
series. However, we tried to make this work as self-contained as possible and hope that it should 
be understandable for readers with a rather limited knowledge of mathematics. 

The present version almost fully coincides with [BabCKP97], see Section [10] for more details. 

2 Preliminaries 

An undirected graph is a pair r = (f2, E) consisting of finite sets Vt and E, the vertices and the 
edges. Each edge connects two different vertices u and v from O and is denoted by {u, v}. This 
means that each element from E is an unordered pair of different vertices from Q. A directed 
graph is a pair A = (£l,R) with vertex set f2 and arc set R, where each arc, denoted by (u,v), 
links two different vertices u and v and additionally is assigned a direction, namely from u to 
v. Each element of R is an ordered pair of different vertices from Q. If a vertex u belongs to an 
edge or an arc then u is said to be incident to the edge or arc. Often it is convenient or useful 
to consider an undirected graph T as a directed graph A with each edge {u, v} replaced by two 
arcs v) and (v,u). 

Usually, a directed or undirected graph is given either by its diagram or by its adjacency matrix. 
A diagram is a drawing on the plane consisting of small circles which represent the vertices 
and lines between pairs of vertices which represent the edges. An arc (u, v) is indicated by an 
arrow starting in vertex u and ending in vertex v. A more abstract representation is the (0,1)- 
adjacency matrix A = (a uv ). In order to make evident that the matrix A represents a graph T, 
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we will also write A(T). The rows and columns of A are indexed by the elements of f2, which 
for sake of simplicity are often numbered by 1,2, ... ,n with n = |0| (or, as for example in the 
computer package COCO, see below, by numbers 0, 1, . . . , n — 1). Thus, A is a n x n— matrix. 
The entry a uv is equal to 1 if the edge {u, v}, respectively the arc (u, v), exists and otherwise. 
Note that the (0,l)-adjacency matrix of an undirected graph is symmetric with respect to the 
main diagonal, whereas in general this is not the case for directed graphs. 

Sometimes it is necessary to deal with (undirected or directed) multigraphs. In a multigraph 
each pair of vertices may be connected by more than one edge or arc. The number of edges resp. 
arcs between two vertices is called the multiplicity of the edge resp. arc. In the diagram multiple 
edges are drawn as parallel lines, multiple arcs as parallel arrows, in the adjacency matrix the 
entry a uv denotes the multiplicity of the edge or arc connecting u and v. 

The complete directed graph is the graph with n vertices where all n(n — 1) arcs are present. 

For certain purposes it is more convenient to consider graphs with loops. A loop is an arc 
connecting a vertex with itself. In this sense, a complete directed graph with loops has n 2 
arcs. In particular, there is one additional arc (u, u) for each vertex u. The main advantage is 
that vertices can be identified with the corresponding loops, which considerably simplifies our 
notation. 

The most general notion of a graph is the colored graph. 

In a colored complete directed graph A all vertices and all arcs are assigned colors in such 
a way that the colors of the vertices are different from the colors of the arcs. Assume that 
{0, 1, . . . , s — 1} are the colors of the vertices and let VLj denote the vertices which are assigned 
color j. Then Q = 0,q U Q\ U . . . U f2 s _i is a partition of the vertex set of A. Similarly, if 
{s, s + 1, . . . , r — 1} are the colors of the arcs and Rk denotes the arcs of color k then R = 
R s U R s +i U . . . U R r -± is a partition of the arc set of A. Each colored complete directed graph 
can be represented by its adjacency matrix A = (a uv ) which contains in the uth row and vih 
column the color of the arc (u,v), that means a uv = k if and only if (u, v) £ Rk- The entry in 
the uth row and uth column is the color of the vertex u, thus a uu = j if and only if u € 

Obviously, any undirected or directed graph can be considered as a colored complete directed 
graph with three colors. The vertices are assigned color 0, the edges (arcs) and nonedges 
(nonarcs) are assigned colors 1 and 2. In the case of a multigraph, different colors of arcs 
correspond to different multiplicities. In this sense, any chemical structure can be seen as a 
colored complete graph. The colors a uu can be interpreted as modes of vertices, for example 
names of atoms in a molecular graph, the colors a uv reflect the multiplicity of bonds or symbolize 
certain chains of atoms. Figure 1 shows a chemical structure and the adjacency matrix A of 
the corresponding colored complete directed graph (interesting properties of this compound are 
discussed in [DunB95 ) is given below; here 0, 1, 2, 3 stands for the atoms of C, N, O, H 
respectively, 4 denotes usual bond, 5 double bond, all other pairs of atoms are denoted by 6, 
upper superscripts denote the labels of atoms. 
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A permutation f acting on a finite set is a bijective mapping from 17 onto itself. For each 
permutation / we denote by v = the image v of an element !i£fl. Let S n be the symmetric 
group of degree n, i.e. the group of all permutations acting on the set 17 with n = |17|. Each 
subgroup G of S n is called a permutation group of degree n. The notation (G, 17) indicates that 
the permutation group G acts on the set 17. 

An automorphism of a colored complete directed graph A = (17, R) is a permutation 5 on 
which preserves the colors of the vertices and arcs, i.e. which fulfills u € 17j -£4> u 9 G 17j and 
(u, u) € -£4* (u 9 ,v g ) £ Rk for all «,DeI! and all colors j, k. It is easy to realize that the set 
of all automorphisms of a graph A forms a group. This group is called the automorphism group 
of A and is denoted by Aut(A). Clearly, G = Aut(A) is a permutation group acting on 17. 

Let (G, 17) be a permutation group acting on 17. We define a binary relation « on 17 in such 
a way that u & v holds for two elements u,v G 17 if and only if there exists a permutation 
<? G G with v = u 3 . Since (G, 17) is a group, the relation « is an equivalence relation on 17, its 
equivalence classes are called orbits (or 1-orbits) of (G, 17). The set l-orb(G,Q) of the orbits of 
(G, 17) forms a partition of the set 17. 

The automorphism partition of the vertex set 17 of a colored complete directed graph A is the 
set l-orb(Aut(A),Q) of the orbits of its automorphism group. Obviously, if two vertices u and 
v belong to the same 1-orbit, then there is an automorphism g which maps u onto v. The 
automorphism partitioning problem is the problem of finding the automorphism partition of a 
graph. 

To give an example, the bijective mapping g on 17 = {1, 2, . . . , 19} defined by 

(1)(2, 6)(3, 5)(4)(7, 8)(9)(10, 12)(11, 13)(14, 15)(16)(17, 18)(19) 

is an automorphism of the colored complete directed graph A which belongs to the structure in 
Figure 1. The automorphism partition of A is 

{{1}, {2, 6}, {3, 5}, {4}, {7, 8}, {9}, {10, 11, 12, 13}, {14, 15}, {16}, {17, 18}, {19}} . 

A graph A is called vertex-transitive if for any two vertices u and v there exists at least one 
automorphism such that v = u 9 . Obviously, if a graph is vertex-transitive then its automorphism 
partition is trivial, meaning that there is exactly one orbit containing all vertices from 17. 

Commonly, a permutation / on 17 is represented by a so called permutation matrix M(f) = 
(m uv ). This n x n— matrix has entries and 1 with m uv = 1 if and only if v = . It is easy to 
see that a permutation matrix has exactly one entry equal to 1 in every row and in every column, 
all other entries are 0. In fact, this property is a necessary and sufficient condition for a matrix 
to be a permutation matrix. Now the property of a permutation to be an automorphism of a 
graph can be reformulated in terms of matrices. Namely, a permutation matrix M determines 
an automorphism g of A if and only if M commutes with the adjacency matrix A of A. This 
means that the equality 

M -A = A-M 

holds. For example, let V be the undirected graph depicted in Figure 2. 
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Figure 2 
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is the adjacency matrix of the corresponding colored complete directed graph A. The permuta- 
tion g of J) = {1, 2, 3, 4} defined by l 9 = 3, 2 9 = 4, 3 9 = 1 and 4» = 2 is an automorphism of A 
with associated permutation matrix 
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Given a permutation group (G, SI), let us now consider the set of permutation matrices M{G) = 
{M(g) \ g £ G}. A graph A is called invariant with respect to the permutation group (G, SI) 
if and only if its adjacency matrix commutes with all permutation matrices from M(G). Let 
us further consider the set V(G, S2) of all n x n-matrices B which commute with matrices from 
M(G), i.e. 

V(G, SI) = {B I M(g) ■ B = B ■ M(g) for all g £ G}. 

V(G, ft) is called the centralizer algebra of the permutation group (G, fl) (the notation V(G, ft) 
stems from the German word "Vertauschungsring" , the use of which goes back to I. Schur and 
H. Wielandt). It is easy to realize that the set of nonnegative integer matrices B from V(G, SI) 
coincides with the set of adjacency matrices of multigraphs which are invariant with respect to 
(G,Sl). A centralizer algebra V(G, SI) is known to have the following main properties: 

(i) V(G, SI) can be considered as a linear space with basis Aq, A±, . . . , A r -±, 
where each A{ is a (0, l)-matrix. 

(ii) Si=o Ai = J, where J is the matrix with all entries equal to 1. 

(iii) For every matrix Ai there exists a matrix Aj with A\ = Aj , 
where A\ denotes the transposed matrix of Ai. 

To each permutation group (G, SI) we can associate a new induced permutation group (G, SI 2 ), 
where for / € G and (u, v) € S2 2 we define 
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Let 2-orb(G,£l) be the set of orbits of the induced action of G on Q 2 . This partition of the 
set of all ordered pairs of elements of f2 is called the partition into 2-orbits of (G, il). Each 
member Ri G 2-orb(G, fi) of this partition defines a graph Tj = (Q,Ri) with the adjacency 
matrix Ai = A (Ti). It turns out that the matrices Aq,Ai,... ,^4 r -i mentioned in (i) coincide 
with the latter adjacency matrices. 

More precisely, the basis matrices Aq, A\, . . . , A r -\ of the centralizer algebra V(G, fl) correspond 
to the 2— orbits of the permutation group (G, £1) in the following manner. The (u,v)— entry of 
the basis matrix A{ is equal to 1 if and only if (u, v) belongs to the i-th 2— orbit. In this sense, 
the set of all 2— orbits can be represented very conveniently by a single matrix of the form 
A = J2i=o i ' Ai- This means that two pairs (u, v) and (u' , v') belong to the same 2— orbit if and 
only if the corresponding entries in the matrix A are equal. 

Let us consider as an example the set 

G = {(1)(2)(3)(4)(5), (1, 2, 3), (1, 3, 2), (1, 2) (4, 5), (1, 3)(4, 5), (2, 3)(4, 5)} 

of permutations acting on the set £1 = {1,2,3,4,5}. It can be easily proved that (G, Q) is a 
permutation group (cf. 4.16 in [KliRRT99]). Now consider the corresponding set of permutation 
matrices 

/ 1 \ / 1 \ / 1 \ 



M(G) = { 
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It turns out (cf. 5.4 in [KliRRT99]) that the centralizer algebra V(G, O) has the basis 
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thus we obtain 
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To our knowledge, B.Yu. Weisfeiler and A. A. Leman were the first to consider the more general 
problem (in comparison with the automorphism partition) of finding the set of 2— orbits of 
(Aut(A),Q) for a given graph A. In this setting the automorphism partition of a graph is a 
byproduct of the determination of the 2— orbits: the 1— orbit of a vertex u is simply the 2— orbit 
of the pair (u, u) (for brevity, we will sometimes speak about the orbits of a graph, of a vertex, 
etc., instead of the orbits of the automorphism group of the graph). 

In the following it will be our goal to determine the 2— orbits of a given graph or, equivalently, 
to find the 2— orbit matrix A. 

3 Stabilization Procedures 

First attempts to attack the automorphism partitioning problem date back approximately thirty 
years. All these approaches try to find the 1— orbits of a given graph. Usually, the classical paper 
|Mor65| by H.L. Morgan is considered to be the first procedure for graph stabilization. Given an 
undirected graph T = (Q,E), the idea is to start with a partition of the vertex set according 
to the valencies. The valency of a vertex u is the number of edges which are incident to u. 
Two vertices u and v are put into the same class of the partition if and only if they have equal 
valencies. Then this partition is refined iteratively using the extended valencies. The extended 
valency of u is defined as the sum of the previous extended valencies of all neighbours of u, i.e. 
of all vertices which are connected with u by an edge. Again two vertices are put into the same 
class of the partition if and only if they have equal extended valencies. This iteration procedure 
terminates as soon as the stable partition is reached, that is the next partition coincides with 
the previous one. 

Years later it has been recognized that Morgan's approach is just a special case of stabilization of 
depth 2. This technique works as follows. Assume that we have a partition f2o, ■ ■ ■ , ^ s -i of 
the vertex set Q, of V according to the valencies. Assume further that this partition is numbered 
such that Q,q contains the vertices of smallest valency, Q,\ the vertices of second smallest valency, 
etc. Now for each vertex u € Q we compute a list of length s which contains the valencies of u 
with respect to each class 0,j (that means the number of edges connecting u with vertices from 
%)j J = 0, 1, . . . , s — 1. Each class may now be partitioned into subclasses, each consisting 
of vertices with equal lists. In this way we may eventually obtain a refinement of the original 
partition. If this is the case, then the subclasses are numbered according to the lexicographical 
ordering of the corresponding lists. Then we restart the same proceeding with the refined 
partition. If in each class the lists of the vertices are identical, then no further refinement is 
obtained and the procedure stops. 

The resulting partition of ft is commonly called the total degree partition of the graph T (see 
e.g. [Tin86j ) . It is the coarsest partition £1$, Oi, . . . , £l s —l of with the property that every two 
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vertices belonging to the same cell Qj have the same valencies with respect to any other cell Q^, 
k = j included (the coarsest equitable partition of V in the sense of |God93| ) . 

The total degree partition cannot be finer than the automorphism partition, since, obviously, a 
necessary condition for two vertices u and v to belong to the same 1— orbit of Aut(T) is that 
they belong to the same class of the total degree partition. In fact, every class of the total 
degree partition is a union of 1— orbits. Figure 3 shows a graph where the total degree partition 
consists of one class only, namely SI, but which is not vertex-transitive, i.e. the automorphism 
partition consists of more than one 1— orbit. This example makes evident the weakness of the 
above kind of stabilization. The method will not give any partition for regular graphs (graphs 
where all vertices have the same valencies), not even in the case when Aut(T) is the trivial group 
consisting of the identity only (what means that each 1— orbit consists of a single vertex). 

The reader will recognize in Figure 3 "cuneane", cf. 4.26 and 5.8 in [KliRRT99j. 




5 4 
Figure 3 

As already mentioned in the introduction, the problem of the recognition of graph symmetry can 
be made more precise in mathematical language as an automorphism partitioning problem; the 
problem of graph identification corresponds to an isomorphism problem. Let us give a precise 
mathematical statement of the latter problem. 

An isomorphism from a graph T = ($7, E) to a graph V = (f2', E') is a bijective mapping h 
from Q to Q' such that (u,v) £ E if and only if (u h ,v h ) 6 E' . If such a mapping exists then 
r and r' are called isomorphic. Obviously, if T = T' then an isomorphism coincides with an 
automorphism. The isomorphism problem is the problem of deciding whether two graphs are 
isomorphic or not. Similarly as for the automorphism problem there is also a matrix formulation 
for the isomorphism problem. Namely, two graphs T and V with adjacency matrices A resp. A' 
are isomorphic if and only if there exists a permutation matrix M with 

M ■ A = A' ■ M. 

If this equality is multiplied from the left with the inverse M _1 of M then, using the fact that 
M = M t holds for each permutation matrix M, we obtain 

A = M* • A' ■ M. 
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This modified equality can be interpreted in the following way. The matrix A is obtainable 
from the matrix A' by permuting simultaneously rows and columns. This corresponds just to a 
renumbering of the vertices. 

G. Tinhofer has found a very interesting algebraic characterization of total degree partitions. He 
first relaxed the notion of an isomorphism between two graphs T and r' using doubly stochastic 
matrices. A matrix X is doubly stochastic if the entries of X are nonnegative and the sum of 
the entries in each row and in each column is equal to 1. Note that every permutation matrix 
is doubly stochastic. Now let again A and A' be the adjacency matrices of V resp. V . Then the 
two graphs are called doubly stochastic isomorphic if and only if there exists a doubly stochastic 
matrix X fulfilling the equality 

X ■ A = A' ■ X. 

Of course, two isomorphic graphs are also doubly stochastic isomorphic, however, the converse 
direction is not true in general. For example, the graphs T and V of Figure 4 are doubly 
stochastic isomorphic (choose X = 1/6 • J), but they are not isomorphic. 



2 3 2 3 



F 




6 5 6 



Figure 4 

Tinhofer proved in [Tin86| that two graphs are doubly stochastic isomorphic if and only if 
they have identical total degree partitions. To be more precise, the total degree partitions 
no,fii,...,n„_i of r = (n,E) and fi' ,O / 1 ,...,fi / s ,_ 1 of r' = (n',E r ) are identical, if s = s', 
= and, for each pair of vertices u £ £lj and u' £ Q'j, the valency of u with respect to 
f2fc is equal to the valency of u' with respect to Q' k , j, k £ {0, 1, . . . , s — 1}. 

The shortcoming of the total degree partition, as pointed out above, motivated the construction 
of more powerful algorithms for graph stabilization. A well known approach is to apply the 
refinement technique, which has already been used for the computation of the total degree 
partition, by replacing the valencies of the vertices by other invariants of the graph. For example, 
for each vertex u we may count the number of cycles of a given length which contain u, the 
number and sizes of cliques containing u, etc. However, finding cycles or cliques in a graph 
is an extremely difficult task which, in general, requires time exponential in the size of the 
graph. Therefore, we should use a criterion which is easy to check. A reasonable approach 
is to consider not only configurations consisting of two vertices, i.e. the edges and nonedges 
(which define the valencies), but to examine also configurations consisting of three vertices, i.e. 
all triples of vertices. This procedure is commonly called stabilization of depth 3. The algorithm 
of Weisfeiler and Leman, which will be formulated and discussed in great detail in Sections 5 
and 6, is based on that principle. 
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We have seen above that stabilization of depth 2 can be associated with a certain combinatorial 
object, namely with the total degree partition of the graph under consideration. This imme- 
diately implies the question whether there is a similar object which belongs to stabilization of 
depth 3. It turns out that such an object really exists. Moreover, this object has not only a 
combinatorial but also a very interesting algebraic description. The details are exposed in the 
next section. 

4 Cellular Algebras 

A matrix algebra of degree n is a set of n x n— matrices which is closed under matrix addition, 
matrix multiplication and multiplication of a matrix by a scalar, i.e. if X and Y belong to the 
matrix algebra and z is any real number, then also X + Y, X ■ Y and z ■ X belong to the matrix 
algebra. 

A cellular (or coherent) algebra is a matrix algebra which additionally is closed under Schur- 
Hadamard (=componentwise) multiplication of matrices and under matrix transposition, and 
which contains the identity matrix I and the matrix J all entries of which are equal to 1. We 
will denote the Schur-Hadamard product of two matrices X and Y by X oY. Thus, if X = (x uv ) 
and Y = (y uv ) then X o Y = (x uv ■ y uv ). Each cellular algebra W has a basis Aq, A\, . . . , A r _\ 
(basis of the vector space W) consisting of (0, l)-matrices which is called standard basis of W, 
r is the rank of W. It is not hard to see that a standard basis Aq,Ai, . . . ,A r -i, if suitably 
numbered, satisfies the following properties: 

(i) E^o 1 M = J 

(ii) J2f=o Ai = I for some q with q < r 

(iii) Ai o Aj = o i + j, i, j € {0, 1, . . . , r — 1} 

(iv) for each % G {0, 1, . . . , r — 1} there is a j £ {0, 1, . . . , r — 1} such that A\ = Aj 

(v) for each pair i, j G {0, 1, . . . , r — 1} we have 

AiAj = p° j Ao+p} j A 1 + ... +p\- 1 A r „ l . 

A cellular algebra W with standard basis Aq,Ai, . . . , A r _i can be represented in a very con- 
venient way using the matrix A(W) = E[=o * ' -^*> called the adjacency matrix of the cellular 
algebra W. This matrix is unique up to the numbering of the basis matrices. 

The nonnegative integers are called the structure constants of W. These numbers have a 
very nice geometric interpretation. We consider the colored complete directed graph A = (17, R) 
which belongs to the adjacency matrix of W. For our purposes, it is very convenient to identify 
each vertex u in A with the corresponding loop (u,u), i.e. we deal with the complete directed 
graph with loops. Then the matrices A^ correspond in a natural way to the arc sets Rk of color 
k (the first q matrices of the standard basis virtually represent the vertices of A; the induced 
partition of the vertex set is called the standard partitioix] of the graph). An arc (n, v) has color 

1 More rigorously, we should speak about standard partition of depth 3 in order to emphasize that the partition 
is induced by stabilization of depth 3. In the following, we briefly call it the standard partition. 
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k if and only if the matrix Ak has entry 1 in the uth row and irth column. The entry (u, v) in 
the matrix Ai ■ Aj is the number of directed paths of length 2 from vertex u to vertex v, such 
that the first step is an arc of color i and the second step is an arc of color j. The decomposition 
(v) implies that for any arc (it, v) of a fixed color k the number of paths of length 2 from u to 
v, such that the first step is of color i and the second step is of color j, is the same and equal to 
pfj (see Figure 5). In fact, if Aq,A\, . . . ,A r _\ fulfill (i)-(iv), then this condition is sufficient for 
these matrices to be the standard basis of a cellular algebra. 




Figure 5 

It is not hard to see (see [KliRRT99j) that each centralizer algebra is also a cellular algebra 
(therefore, the matrix A given at the end of Section 2 represents a cellular algebra of rank 6 
with standard basis Aq, Ai, . . . , A5). Let W be a cellular algebra. If a group with the centralizer 
algebra W exists then the cellular algebra W is called Schurian, after I. Schur, who was the first 
to investigate cellular algebras (actually he was using a different terminology of group rings, see 
[KliRRT99] for details). The importance of Schurian cellular algebras stems from the fact that, 
as already indicated in Section 2, its basis matrices correspond to the 2— orbits. Moreover, the 
diagonal matrices of the basis correspond to the 1— orbits. 

Given any n x n— matrix X, the cellular algebra W(X) generated by X is defined to be the 
smallest cellular algebra which contains X. It is important to know that this definition is 
rigorous. That means the resulting algebra is well defined and unique (for a detailed explanation 
see [KliRRT99]). As a consequence, we are able to associate with each graph T a matrix algebra, 
namely the cellular algebra W(.A) which is generated by the adjacency matrix A of T. We will 
also write W(T) in order to indicate that the cellular algebra corresponds to the graph T. 

There are a number of graph classes whose associated cellular algebras are Schurian (for example 
graphs with a simple spectrum [Pon94a| ). For those graphs the automorphism partition can be 
immediately deduced from the cellular algebra. Namely, in this case the automorphism partition 
coincides with the standard partition. Unfortunately, this is not the case in general. The 
simplest counterexamples can be found among strongly regular graphs. An undirected graph 
is called strongly regular (see [H esH7lj ) if the standard basis of the associated cellular algebra 
consists only of 3 matrices. The first such examples of non-Schurian cellular algebras of rank 
3 were found by H. W ielandt jgjgg H, L.C. Chang |Cha59| . S.S. Shrikhande |Shr59j and G.M. 
Adel'son-Velskii et al. [AdeWLF69] . All these graphs have rather large automorphism groups. 
Later on, collaborators of Weisfeiler found an example with the identity automorphism group 
(see [Wei76] ). 

Nevertheless, although we cannot guarantee that we get the 1— orbits and 2— orbits for each 
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graph, extended practical experience indicates that the results obtained by the cellular alge- 
bras are sufficient, in particular for practically all chemical graphs. This is confirmed by the 
computational results which are presented in Section 9. 

The representation of a cellular algebra as a colored complete directed graph and the interpre- 
tation of the structure constants p^- shows that in a cellular algebra implicitly all configurations 
of a graph consisting of three vertices are considered. In other words, the cellular algebra is the 
algebraic object which is associated to stabilization of depth 3. 

At the end of this section, let us give a precise statement of the problem which now has to be 
solved. Given a graph T, we actually deal with two closely related problems. 

Problem 1 : Compute the basis Aq, Ax,..., A— l of the cellular algebra W(T) (or in other words 
the colored complete graph A which is associated to the cellular algebra W(T)). 

Problem 2 : Construct the colored complete graph A with the structure constants p^. 

In the next section we give an algebraic description of the algorithm of Weisfeiler-Leman which 
settles Problem 1. In Section 6 we present a very illustrative graph theoretical interpretation 
which solves Problem 2. 

5 Algebraic Description of the Algorithm 

B. Weisfeiler and A. Leman were the first to show that the cellular algebra of a graph can be 
computed in polynomial time. The proposed method, firstly described in Russian in the paper 
j\\cil.(iS . has as input any matrix A (the adjacency matrix of a graph T) and as output a basis 
of the cellular algebra W(.A) generated by A. The initial description of the algorithm was too 
sophisticated, clearer ones are given in the English written book |Wei76] . and also in [Fri89j and 
Hig87|. We will first explain the main features of the algorithm by combining all these ideas, 
illustrate it by an example, and then present a formal description. 

The construction of the cellular algebra H^(^4) proceeds iteratively. We start with the adjacency 
matrix A = {a uv ) of an undirected or directed graph T (the diagonal entries are set to be different 
from the nondiagonal entries). This matrix can be written in the form A = J^i=o * ' ^* where 
Ai are (0, 1)— matrices with (u, v)— entry equal to 1 if and only if a uv = i. At the end of each 
iteration we obtain a new set of (0, 1)— matrices A' ,A[, . . . ,A! r ,_x which fulfills the properties 
(i)-(iv) of a cellular algebra as stated in Section 4, but which may fail property (v). In particular, 
this means that A' , A'x, ■ ■ ■ , A' ,_ is the basis of a linear subspace S which is closed under Schur- 
Hadamard multiplication and transposition and which contains the identity matrix / and the all 
1 matrix J. Note that this also holds at the beginning of the procedure in case A is symmetric 
and the values of the diagonal entries are different from all other entries. 

Initially, the linear subspace S with basis matrices Aq, Ax, ■ ■ ■ ,A r -x w ih n °t fulfill property (v) 
of a cellular algebra, i.e. S will not be closed with respect to matrix multiplication. It is easy to 
see that S contains all products of matrices from S if and only if it contains all products Ai ■ Aj 
of basis matrices. Therefore, we consider in each iteration the linear subspace which is generated 
by all these products AqAq,AqAx, . . . , A r -xA r -x and compute a basis of (0, 1)— matrices for it. 
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This process is repeated until it is stable, this means the basis of the actual iteration coincides 
with the basis of the previous iteration (up to the numbering of the basis matrices). In that case 
the subspace S is closed under matrix multiplication, property (v) is satisfied and, consequently, 
S is a cellular algebra. 

A straightforward method to construct the basis A' , A[, . . . , A' r ,_ x of a linear subspace S which 
is generated by some set of matrices {Bq, B\, ... , -Bp-i} (in our case this is just the set {j4oA)i 
AqA\, . . . ,A T -\A r -iY) and which is closed under Schur-Hadamard multiplication is described 
in the paper |Hig87|. Let B = Bq. If B = then set Bi = B,i + \ for i = 0, 1, . . . ,p — 1 and repeat 
the procedure. Assume that B = (b uv ) ^ and let t be a nonzero entry of B. Set D = (d uv ) to 
be the (0, 1)— matrix such that d uv = 1 if and only if b uv = t. Let B = D o B\. If 1? ^ then let 
D be a (0, 1)— matrix as constructed above. Now let B = D o B<i and repeat the procedure. If 
B = then let B = D o B% and repeat the procedure. The last (0, 1)— matrix D will be the first 
element A' in the basis of S. Let B^ = B{ — Bi oA'^ i = 0, 1, . . . ,p— 1 and repeat the procedure. 
In this way we obtain a set A' , A[, . . . , A' r ,_ l of (0, 1)— matrices fulfilling property (iii). 

This procedure can be formulated in a more compact and convenient way (which in fact was 
the original way used by Weisfeiler-Leman) by introducing indeterminates ij, i = 0, 1, . . . , r — 1, 
which can be considered to represent the different entries of an adjacency matrix. In this sense, 
the matrix D = X)i=o M-^-i represents the adjacency matrix. Since matrix multiplication is a 
noncommutative operation, it is important in the following to assume that the indeterminants 
are noncommuting with respect to multiplication, i.e. t{tj ^ tjti. Now compute the product 
B = D ■ D = Y7iZo Sj=o titjAiAj. Each entry of this matrix is a sum of products titj. In order 
to obtain the basis of S we replace equal entries in B by new indeterminates t^. Now it is not 
hard to verify that the (0, 1)— matrices A' , A' x , . . . , A' r ,_ 1 of the resulting matrix A' = J^I^q 1 t'i-^'i 
are exactly the basis matrices of S. 

The following very simple example will illustrate this procedure. Consider the graph T which is 
depicted in Figure 6. 

H 4 H 5 




Figure 6 

Here superscripts denote the numbers from O, = {1, 2, 3, 4, 5, 6} associated to atoms which form 
the molecule of ethylene. Let A be the adjacency matrix of the colored graph T associated to the 
molecular graph depicted in Figure 6 (here stands for the carbon atom, 1 for hydrogen atom, 2 
for usual bond, 3 for double bond and 4 means that there is no bond between the corresponding 
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atoms), 
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Now we proceed with the matrix A', 

/023344X 
2 4 4 3 3 
, _ 5 6 1 7 8 8 

5 6 7 1 8 8 ' 

6 5 8 8 1 7 

V 6 5 8 8 7 1 / 

We suggest that the reader repeats the process with the matrix A' instead of A and checks that 
the matrix A" resulting from the second iteration coincides with A'. This means that A' is in 
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fact the desired result of the stabilization, namely, the adjacency matrix A(W(r)). The reader 
can easily find that the order of the automorphism group G = Aut(r) of our graph V is equal 
to 8 and the matrix A' represents the set of 2-orbits of (G, Q). Hence in this case W(r) is really 
a Schurian cellular algebra. 

Here is a formal description of the algorithm (which will be denoted in the following by the 
initials of the authors). 

Algorithm WL 

Input: the adjacency matrix A = A(T) = (a uv ) of colored graph T. 

Output: a standard basis Aq,Ai, . . . ,A r -i of the cellular algebra W(T), or more exactly the 
adjacency matrix A(W(F)). 

(0) Let {0, 1, . . . , s — 1} be the set of different entries of A. 
For k = 0, 1, . . . , s — 1 do 

Define A k = (a{k) uv ) to be the matrix with 
a(k)uv = 1 if a uv = k and a(k) uv = otherwise. 
Let r := s. 

(1) Let D = Y^t k A k , 

where to, t\, . . . , i r _i are distinct indeterminates, 
which are noncommuting with respect to multiplication. 

(2) Compute the matrix product B = (b uv ) = D ■ D. 

Each entry b uv of B is a sum of products t rfj . 

(3) Determine the set {do, d±, . . . , d s _i} 

of different expressions among the entries b uv . 

(4) If s > r then 

For k = 0, 1, . . . ,s — 1 do 

Define A k = {a(k) uv ) to be the matrix with 

a{k) uv = 1 if b uv = dj. and a(k) uv = otherwise. 
r := s. Goto (1). 

(5) STOP. 

As already indicated in the previous section, it follows from the representation of a cellular 
algebra by a colored complete directed graph that algorithm WL implicitly considers all config- 
urations consisting of three vertices in the graph. There are other methods (see e.g. the references 
in |RueR9 0b] ) for perception of graph symmetry which also use configurations of three vertices. 
The main advantage of algorithm WL over them is the fact that the cellular algebra contains 
exhaustive information which is obtainable from subgraphs of at most three vertices. Thus, the 
resulting partitions of the vertices and pairs of vertices are the finest among the partitions which 
can be deduced using these configurations. 

An interesting method which is rather close to algorithm WL is described in |RueR90b] . It is 
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based on raising the adjacency matrix to its higher powers, evaluating the entries and partitioning 
the atoms and pairs of atoms into equivalence classes. Indeed, this procedure considers subgraphs 
consisting of three vertices. However, some information can get lost when the resulting partitions 
are coarser than those obtained by algorithm WL. In particular, an "orientation" of edges cannot 
occur. Note that this is possible even if the algorithm WL is applied to an undirected graph. 
Consider for example the graph T in Figure 7. 




/ 2 2 2 \ 

3 14 4 

3 4 14 

\ 3 4 4 1 J 



Figure 7 



The edge {1, 2}, for instance, can be considered to be oriented (in the sense that its endvertices 
are situated differently according to the whole graph) . Algorithm WL applied to this graph yields 
not only a coloration, but also an orientation of the edges, i.e. in the colored complete directed 
graph, which represents the cellular algebra, the arcs (1,2) and (2, 1) have different colors (the 
colored complete directed graph is given above by its adjacency matrix). The cellular algebra 
of r is of rank 5. The standard basis consists of the following matrices: 
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Note that for the same graph the algorithm which was described in [RueR90b] will get as the 
output the symmetric matrix 
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2 
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V 2 



2 \ 
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3 

1 / 



hence two antisymmetric basic matrices of W^(r) will be merged. 



6 Graph Theoretical Interpretation 

In the previous sections, we already indicated that each cellular algebra W with basis Aq, A±, . . . , 
A r -i can be represented by a colored complete directed graph A = (Q,R), the graph whose 
adjacency matrix is the matrix ^4(VF) = ^4(A) = Y7iZo (with indeterminates t, standing 
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for the colors i). This matrix sometimes is called generic matrix of W. For convenience, the 
vertices in A have been identified with the corresponding loops. In this sense, the basis matrix 
Ak corresponds to the arc set Rk of color k, and R = {i?o, Ri, ■ ■ ■ , R r -\}- The number of 
colors in A is equal to the rank of W . More generally, to each linear subspace S with linear 
basis Aq, Ai, . . . , A r _i satisfying the properties (i)-(iv) of a cellular algebra, we can associate 
a colored complete directed graph A = (Q,R), the graph which belongs to the generic matrix 

With this representation, the main idea of the algorithm described above can be sketched in 
a more illustrative manner. In each iteration of the algorithm, the coloring of the underlying 
complete directed graph is modified by means of Schur-Hadamard multiplication and matrix 
multiplication. These two operations may be interpreted as follows. Given generic matrices 
X = (x uv ) and Y = (y U v) of two colored complete directed graphs A' and A" with indeterminates 
representing the colors, the Schur-Hadamard product X o Y = (x uv y uv ) corresponds to the 
generic matrix of a new colored complete directed graph A where the color of arc (u, v) is the 
ordered mixture of the colors of both arcs in the original graphs. In the case of the matrix 
product X ■ Y, the color of the arc (u, v) in the new graph A depends on the number and colors 
of paths of length 2 starting in u and ending in v such that the first step in the path is an arc 
of A' and the second step is an arc of A". The (u, i>) — entry ^2i W x uw y wv of X • Y completely 
describes the set of these paths. 

In the following we present a slightly different procedure leading to our main goal, namely an 
efficient computer program for graph stabilization. The method is based on the computation of 
the structure constants pfj which have been defined and interpreted in Section 4. To recall the 
main result, a basis Aq,Ai,..., A r _i of a cellular algebra W must fulfill 

AiAj = p%A + p 1 ij A l + ...+ v r ^K-\ 

for each pair i,j € {0, 1, ... ,r — 1}. In the colored graph A, this means that each arc (u,v) 
of a given color k is the basis arc of exactly pfj triangles with first nonbasis arc of color i and 
second nonbasis arc of color j (a triangle consists of three not necessarily distinct vertices u, v, w 
and arcs (u,v), (u,w) and (w,v). The arc (u,v) is called the basis arc, the other arcs are the 
nonbasis arcs of the triangle; see Figure 5). 

The idea of the algorithm can be described informally as follows. One iteration includes the 
round along all arcs of the given graph A. For each arc (u,v) of a fixed color k we count 
the number of triangles with basis arc (u,v) and nonbasis arcs of color i and j, respectively, 
i, j = 0, 1, . . . , r — 1 (equivalently, we count the number of paths of length 2 such that the first 
arc (u,w) is of color % and the second arc (w,v) is of color j). These numbers should be equal 
for all arcs. If this is true, then these numbers are just the structure constants p\y If not, then 
the arc set Rk of color k has to be partitioned into subsets R^, R^, • • • , Rk t -n each consisting 
of arcs with the same numbers. This step is performed for all colors k G {0, 1, . . . , r — 1}. Then 
the graph A is recolored, i.e. we identify color ko with the old color k and introduce the new 
colors ki, . . . , kt-i (in algebraic language, recoloring A means to replace the basis matrix Ak by 
new basis matrices A^^A^, . . . ,Aj tt _ 1 ). 

The next iteration is performed for the recolored graph A. If in some iteration no new colors are 
introduced, then the process is stable and we can stop. In this case, the graph A with the final 
stable coloring represents the required cellular algebra W. Here is a more formal description of 
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the algorithm. 



Algorithm STABIL 

Input: the adjacency matrix A(T) = (a uv ) of a colored graph T. 

Output: a complete directed graph A = (fi, R) with a stable coloring 

and the structure constants pjL-. 

£ j 

(0) Let {0, 1, . . . , s — 1} be the set of different entries of A(T) and 

A = (fi, i?) the colored complete directed graph belonging to A(F). 
Determine the arc sets Rq, R\, . . . , R s -\ of colors 0, 1, . . . , s — 1. 
Let r := s. 

(1) For jfe = 0,1,... ,r - 1 do 

For all (u, v) € do 

Compute the numbers pfj of triangles with basis arc (u, v) and 
nonbasis arcs of colors i and j, respectively, i,j € {0, 1, . . . , r — 1}. 

Collect all arcs having the same parameters p^-, i.e. all arcs 
which belong to the same number of triangles of any colors, 
and assign them to new sets Rk , Rk x , ■ ■ ■ , Rk t -i ■ 

Replace Rk by R ko , R^ , • • • , Rk t -i > 

i.e. recolor the arcs of color k using the old color ko = k 
and the new colors k±, . . . , kt-i- 

(2) Let s be the number of colors used to recolor A. 

If s > r then 

r := s. Goto (1). 

(3) STOP. 

Let us illustrate this method by a small example, namely we consider again cuneane, see Figure 
3. 

Here 

/ 1 2 3 3 3 3 2 

2 1 2 2 3 3 3 

3 2 1 2 3 3 3 
3221233 
3332122 
3 3 3 3 2 1 2 
2 3 3 3 2 2 1 

\2323323 



2 
3 
2 
3 
3 
2 
3 
1 
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is the adjacency matrix of the corresponding colored graph; 



B = 



/12344322\ 

2 1 5 5 4 6 4 3 
35154462 
4 5 5 1 2 4 4 4 
4 4 4 2 1 5 5 4 

3 6 4 4 5 1 5 2 
24645513 

\23244231y 



is the result after the first iteration, 



12344325\ 
6 7 8 9 10 11 12 13 

13 8 7 9 10 12 11 6 

14 15 15 16 17 18 18 14 
14 18 18 17 16 15 15 14 
13 11 12 10 9 7 8 6 
6 12 11 10 9 8 7 13 
53244231, 



C = 



V 



is the result which we get after the second iteration (it in fact coincides with the final result). 

7 Program Implementation 

The presented algorithm STABIL has been coded in C and was tested on a SUN-Sparcstation. 
The program requires as an input a file containing the number of colors, the vertex number n 
and the adjacency matrix A(T) of an arbitrary graph T, and provides as an output the number 
of colors (i.e. the rank), the number of cells in the standard partition, the adjacency matrix of 
the cellular algebra VF(r), and (if requested) the structure constants of H^(r). 

In the following we will give some information about the implementation. The adjacency matrix 
of the graph T is stored in a n x n— matrix. After each iteration of the program, this matrix 
will contain the adjacency matrix of the actual colored complete graph A. In the final state it 
contains the adjacency matrix of W(T). 

During any iteration of the program, except the last one, the number of colors increases and 
some arcs of A are recolored. Arising of new colors implies a new iteration, while absence of 
new colors during some iteration gives the sign for finishing the program. In the latter case a 
new iteration will not change the coloring of A. 

Any iteration includes the following two imbedded loops: the loop around the colors and the 
loop around the arcs of a fixed color. To handle the loop around all arcs of a fixed color, we 
introduce an additional data structure representing the graph A, namely lists which store the 
arcs (u, v) of a given color. Each element of a list contains the entries u and v , i.e. the number 
of the row and column of the arc in the adjacency matrix, and a pointer to the information 
concerning the next arc of the given color. 
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The following actions with one arc (u, v) of color k form an elementary step of the program, 
except the last one: 



the end of a vector MEMORY. 

(iii) Assigning the color to the arc (u, v). This is done by examining whether the last sequence 
in MEMORY is a new sequence (then a new color is introduced), or the same sequence already 
appears in MEMORY for some other arc of the actual color k. 

Let us consider these actions in some more detail. 

Computation of the structure constants. As mentioned above, each arc is characterized by a 
set of r 2 numbers, called structure constants. The geometrical meaning of these numbers has 
been described, too. In order to calculate the numbers p\j for a given arc (u,v), we examine 
all triangles with basis arc (u,v) (see Figure 5). Note that there are n such triangles. If for 
some vertex w the arc (u,w) has color i and the arc (w,v) has color j, then we increase pfj by 
1 (initially all p\j are set equal to 0). Thus the number of nonzero p\j does not exceed n. 

If we compute simultaneously all r 2 structure constants for an arc and store them in a straight- 
forward way using an r x r— matrix, then we will soon get storage overflow, since the existence 
of a matrix with r 2 elements in the program is impossible already for comparatively small n. 
However, since there are at most n nonzero structure constants for each arc, we can use instead 
of a matrix CONST with r 2 entries a data structure whose size is proportional to n. This data 
structure consists of lists whose elements contain the information i, pfj and a pointer to the 
next nonzero structure constant in the column j of the matrix CONST. Additionally we need 
a vector with r elements to save the pointers to the columns (the described technique, called 
hashing, is a well known tool to treat efficiently set manipulation problems; see [AhoHU74] ) . 

Saving of the structure constants for arcs of a given color. In view of the forthcoming manipu- 
lations, it is more adequate to save the structure constants as a sequence of triples i,j,pfj- The 
maximal number of such triples in the sequence of a given arc is n. During the pass along the 
arcs of a given color k, we have to save all sequences of structure constants belonging to the arcs 
of this color. Therefore the length of the vector MEMORY should be about 3n 3 (note that the 
number of arcs of color k is restricted by n 2 ). 

The search along MEMORY and recoloring of an arc. The nonzero structure constants for a 
given arc (u, v ) of color k are saved at the end of the vector MEMORY. Then we should examine 
whether we have a new sequence or whether an identical sequence has been saved in MEMORY 
before. We use the partial ordering of the sequences by their lengths. So, if we search for an 
identical sequence, in the case of equal lengths we come to an element by element comparison, 
in the case of nonequal lengths we pass to the next sequence. 

As mentioned before, the main problem in the program is to find memory for the structure 
constants. For graphs with a comparatively large number of vertices, it is impossible to save 
all these constants for a given color k. Therefore we decided to save only the nonzero numbers 

1 We stress the reader's attention that the term "structure constants" has a rigorous meaning only after the 
fulfillment of the program. Currently the structure constants are just the numbers of triangles with the prescribed 
properties. 




& for (u,v) as the sequence of triples (i,j,ph) at 



>mce 



i,j € {0, l,...,r - 1}, i.e. A is 
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as triples i,j,Pij in the vector MEMORY. Now, some graphs may initially produce a very large 
number of different sequences i,j,ph and the corresponding massive storage requirements may 
exceed the available memory capacity of small computers. Therefore we start the program with 
a simple preprocessing procedure which aims to increase in advance the number of colors, before 
starting the main program. 

In the first step of this preprocessing we classify the vertices of the (colored) graph T in the 
following way. Two vertices are put into the same cell if and only if they are incident to the same 
number of edges of each color. Then we can recolor the edges according to the new coloring 
of the vertices. If two edges, which initially have the same color, connect two different colored 
pairs of vertices, then the edges are assigned different colors, too. 

Some additional practically important details about the current version of the program imple- 
mentation of algorithm STABIL may be found in Section 9. 



8 Estimation of the Complexity 

The question of the theoretical complexity of the WL-stabilization was not considered for a long 
time. Weisfeiler and Leman only stated that the complexity is polynomial in the vertex number 
n of the graph, without giving any explicit time bound. A first attempt for an estimation was 
done by S. Friedland. In [Fri89] he pointed out that the required time is restricted by 0(n 10 ). 
I.N. Ponomarenko |Pon93a| improved the time bound to 0(n 5 log n). Very recently, L. Babel 
showed in |Bab95] that the algorithm can be implemented to run in time 0(n 3 logn). 

8.1. Before we are going to determine the worst case complexity of our implementation, let 
us once more stress that we are dealing with two different problems, depending on the point 
of view. The first problem is to find the standard basis of the cellular algebra W(r) belonging 
to some graph T. The second problem, which many times appears in framework of algebraic 
combinatorics, is to compute the structure constants p^ and a stable coloring of the complete 
directed graph A representing the cellular algebra W(r). Our implementation solves the second 
problem whereas the implementation of the above-mentioned algorithm by Babel solves the first 
problem. 

Let us now analyze the complexity of algorithm STABIL. We only have to examine step (1), the 
initializing step (0) and step (2) obviously can be performed in time 0(n 2 ). We have seen in the 
previous section that an elementary step of (1) consists of three actions which are performed for 
each of the n 2 arcs of the graph. 

In action (i) the n triangles with basis arc (u, v) have to be found. This can be done in time 
0(n) by inspecting the uth row and the vth column of the adjacency matrix A = (a uv ) of the 
graph. Note that for each vertex w there is one triangle with basis arc (u,v), the entries a uw 
and a wv of A are the colors of the nonbasis arcs (u, w) and (w, v). For each triangle the value of 
some parameter pfj is actualized. To be more precise, let a uv = k. Then the value of pfj must 
be increased by 1 if a uw = i and a wv = j. In order to find the actual value of p\j we have to pass 
through the corresponding list in the data structure (containing the information i,pfj)- Since the 

1 Readers who are not familiar with complexity considerations of algorithms are referred to the standard book 
|AhoHU74] , 
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length of this list is at most n, this requires time 0(n) for each triangle. Thus, action (i) requires 
total time 0(n 2 ) for one arc. Since there are at most n nonzero structure constants for each arc, 
action (ii), namely saving the sequence of structure constants in the vector MEMORY, can be 
executed in time 0{n). The most time consuming part is action (iii). In order to compare the 
sequence of triples i,j,pfj for one arc (u, v) with all such sequences already stored in MEMORY, 
we eventually have to pass through the whole vector MEMORY. Since this vector may be of 
length 3n 3 , this requires time 0(n 3 ). (We stress that the current implementation does not use 
any storage/search technique, see also 8.5). 

This shows that the complexity of one elementary step is 0(n 3 ). Since n 2 arcs are treated, 
one iteration of step (1) requires time 0(n 5 ). Now it remains to give a bound on the number 
of iterations. If only one color is added during each iteration, then there are n 2 iterations. 
Altogether, this results in a worst case time bound of 0(n 7 ). 

Remarks 

(i) . Similar reasonings were done less carefully in [ChuKP92] and thus resulted in the evaluation 

0(n 8 ). 

(ii) . In fact, the number of iterations in the WL-stabilization is less than n 2 ( |Ade95| ). however 

such an opportunity to improve the evaluation will not be used in this paper. 

(iii) . In the worst case, if M^(r) coincides with the full matrix algebra of order n, there are ra 2 

basis matrices and therefore n 6 structure constants (most of them are zero). This infor- 
mation may help the reader to realize the difference between the statements of Problems 
1 and 2. 

The crucial point in the implementation concerning both running time and space requirement 
is the vector MEMORY. In order to make the program applicable also for relatively small 
computers, it is favourable to restrict the length of MEMORY to 0(n 2 )0. With this modification, 
it may be impossible to store all the different sequences of structure constants. In that case, new 
sequences are not saved and all the corresponding arcs are assigned the same color. This color 
will be split during the next iteration of the program (note that this procedure may increase the 
number of iterations which are needed to obtain the stable coloring). 

8.2. As mentioned above, the implementation of the algorithm presented in [Bab95| has a 
considerably lower worst case complexity. We will very briefly indicate the main ideas of that 
implementation. Compared to algorithm STABIL, there are two main modifications, one de- 
creases the number of triangles which are examined in one iteration, the other involves some 
sophisticated sorting techniques. 

In each iteration of algorithm STABIL, all n 3 triangles of the colored graph A are examined. 
However, it is not really necessary to inspect the whole set of triangles. One can restrict to 
a certain subset. Roughly sketched, the procedure is the following. Let a colored complete 
directed graph A be given. In each iteration some arcs of the graph will keep their colors, others 
are assigned new colors (which have not been used in the previous iteration). More concrete, 
the arc set Rk is split into subsets Rk , Rk±-, ■ ■ ■ , -Rfc t _i, where one of the colors kg, ki, . . . , kt-i is 

1 In the actual version of the program, we defined MEMORY to be of length 3n 2 , see also additional remarks 
in next section 
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equal to k and the others are new colors. The basic idea is to inspect only those triangles which 
contain at least one arc of a new color. Denote by Tm the set of these triangles. Further let Rm 
denote the set of arcs which are basis arcs of triangles from Tm- Now, step (1) of the algorithm 
STABIL is modified in the following way. Each arc (u, v) G Rm is the basis arc of some (in 
general less than n) triangles from Tm- For (u,v) G Rm list the colors of the nonbasis arcs of 
these triangles. Now, each arc from Rm is associated a multiset of some ordered pairs 
Collect arcs with equal multisets and assign them the same color, i.e. replace each arc set Rk 
by suitable subsets Rk , R^, ■ ■ ■ , Rk t -i- One of these subsets, say Rk (which may be empty), 
consists of all arcs from Rk which do not belong to Rm- The procedure stops if no new colors 
are generated. 

It is not yet designated which one of the colors ko, ki, . . . , kt-i is equal to the old color k and 
which ones are new colors. The effort for each iteration is kept low if Tm contains only a small 
number of triangles. Therefore it is favourable to identify k with that color k p where Rj- contains 
the largest number of arcs. It is not hard to check the correctness of this method (for details 
see [Bab95] ). The worst case complexity is determined as follows. 

Let Th denote the cardinality of Tm in the hth iteration. Then obviously \Rm\ < T h- Therefore, 
multisets of at most Th arcs (u, v) have to be computed. These multisets are stored as lists 
S(u, v) and are obtained as follows. Order the Th triangles from Tm lexicographically according 
to the colors of the nonbasis arcs (a pair appears before (i',f) if and only if i < il 
or % = i' and j < f). Note that the colors are in the range {0, 1, . . . , n 2 — 1} (since the graph 
has n 2 arcs, not more than n 2 colors can occur). It is well known that lexicographical ordering 
of Th pairs of integers from {0, 1, . . . , n 2 — 1} can be done in time 0{jh + n) using the sorting 
routine bucket sort. Now, the lists S(u,v) are obtained by passing through the ordered list of 
triangles and assigning the actual triangle to its basis arc (u,v), i.e. the colors of the nonbasis 
arcs are inserted at the end of S(u,v). Obviously, this requires time 0(jh)- Note that the pairs 
of colors in the lists S(u, v) appear now in lexicographical order. 

To identify the different multisets we have to order the lists S(u,v), (u,v) G Rm, lexico- 
graphically. Since the total length of all lists is and the entries are pairs of numbers from 
{0, 1, . . . , n 2 — 1}, this again can be done with bucket sort in time 0{jh + n). Now the arc sets 
Rk are split in the obvious way by passing through the ordered list of multisets S(u,v). This, 
as well as finding the subsets Rk v of largest cardinality, requires time 0(t/i). 

So far we have seen that the complexity for the hih iteration of this method is 0{jh + n). It 
remains to compute the total complexity for all iterations (trivially, the number of iterations is 
restricted by the maximal number n 2 of colors). 

Since Rk p has been chosen to be that subset of Rk with largest cardinality, each of the other 
subsets (which contain the arcs with new colors) has at most half the size of Rk- Therefore, each 
time a certain triangle is inspected, at least one arc set which shares an arc with this triangle 
is at most half as large than before. As a consequence, each of the n 3 triangles is examined not 
more than 3 log n 2 = 6 log n times. This shows that J2h T h < 6 n 3 log n. Finally we obtain a 
worst case time complexity of (9(n 3 logn). 

8.3. These ideas have been realized in a computer program by L. Babel, S. Baumann and M. 
Liidecke. The program is termed STABCOL, due to the fact that the coloring of the complete 
directed graph is modified in each iteration until the process is stable, i.e. until a STABle 
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COLoring is obtained. It is coded in programming language C and was also tested on the same 
SUN-Sparcstation. Just as for the program STABIL, the input is a file containing the number 
of colors, the vertex number and the adjacency matrix of a graph T, the output contains the 
number of colors, the number of cells and the adjacency matrix of the cellular algebra which is 
generated by T. 

Contrary to STABIL, the program STABCOL does not work with vectors of predefined lengths 
but uses more sophisticated data structures. The set Tm of triangles and the multisets S(u, v) 
are stored in lists which are linked by pointers and which are of variable length. In this way, 
waste of memory space is avoided. Furthermore, memory space which is no longer needed is set 
free immediately. 

The complexity analysis of STABIL shows that most of its time is spent in order to compare 
a new sequence of numbers with old sequences. Since this is done in the obvious way by 
passing through the entire vector of sequences, it requires time proportional to the length of the 
vector. This somewhat time-consuming procedure is avoided in STABCOL by means of very 
special sorting techniques. These techniques and the more complicated data structures make 
the implementation much more ambitious. However, we do not have enough space to go into 
details here. The interested reader may consult the program description |BabBLT97j . 

8.4. At first glance, a comparison of the theoretical complexities indicates that the implemen- 
tation of [Bab95j should be preferred. However, it turns out that this implementation, although 
theoretically very fast, is rather slow in practice and applicable only for relatively small graphs, 
whereas our implementation, although inferior with respect to the worst case bound, is very fast 
in practice and is able to handle very large graphs (the practical behaviour of both program 
implementations is documented in the next section). 

Here we are confronted with a situation which seems to be strange but which rather frequently 
occurs on the construction of algorithms. There are two algorithms or two implementations of 
an algorithm solving the same problem, one of them theoretically fast (i.e. with a good worst 
case complexity) but practically slow, the other one practically fast in spite of a relatively bad 
worst case complexity. 

There are two main reasons for this paradox. First, and perhaps most important is that the 
sign "O" in the evaluation means in fact the existence of some constant as a multiplier with 
the monomial depending on n. The actual value of this constant depends on many factors, in 
particular on the "complexity" of the data structures. In our case the multiplier for the imple- 
mentation of STABIL is essentially smaller than the one for STABCOL. Thus the advantages 
of the theoretically faster algorithm cannot be felt on comparably small graphs. Second, one 
algorithm has been constructed from a purely theoretical point of view with the aim to obtain 
a worst case complexity as good as possible. No practical considerations are taken into account 
such as simplicity of data structures, easy way of implementing, small space requirements, etc. 
In particular, the running time of the algorithm in the mean (the mean taken over a large repre- 
sentative selection of practically relevant examples) is not considered. This average behaviour, 
however, is much more important for practitioners than the worst case behaviour, which often 
occurs only for pathological examples. The second algorithm is constructed from that practical 
point of view. It aims to solve the "real world" problems very fast, without paying attention to 
its theoretical complexity. 
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8.5. It is worthwhile to stress that the worst case time bound 0(n ) (see 8.1) is a very rough 
upper bound. 

In fact we see a number of opportunities to diminish this bound essentially. One of them was 
mentioned in Remark (ii) in Subsection 8.1. 

Also (as a result of restricting the length of vector MEMORY to 0(n 2 )) the complexity of one 
elementary step in the current version of the program is actually reduced from 0(n 3 ) to 0(n 2 ). 
In spite of the fact that the total number of steps may slightly grow, here we have one more 
standby to reduce the upper bound. 

A more careful analysis of the applied technique of hashing together with a more clever organi- 
zation of storage (via the use, e.g., of balanced binary trees) may essentially decrease the number 
of comparisons when we operate with the vector MEMORY. 

However we do not use these and other possible options in the current preliminary version of 
our report. In contrast to [BabBLT97], our report is oriented towards those practical users of 
STABIL, for whom the theoretical question of the evaluation of the efficiency does not play a 
crucial role. 

Nevertheless, we intend to return to the consideration of this question in the future. 

9 Testing the Program 

The presented implementation STABIL has been tested on a large number of structures. All 
computations were done on a SUN-Sparcstation 10 with 128MB RAM. On this machine, the 
program is able to handle graphs with up to 2000 vertices. To demonstrate the capability of 
the program we considered acyclic compounds and compounds containing multiple bonds or 
heteroatoms as depicted in Figure 8. These structures also appeared as illustrations in papers 
by other authors (see e.g. [RanBW80], [RueR90b] ). The results are summarized in Table I. 
Besides the running time of the program, the number of cells in the standard partition and 
the number of colors in the stable coloring (i.e. the number of equivalence classes of atoms and 
ordered pairs of atoms) are stated. In order to make evident the practical efficiency of our 
program, we also state the running time of the program implementation STABCOL. Note that 
STABCOL requires space proportional to n 3 , therefore it can handle graphs with not more than 
150 vertices. 
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Table I. Results for the structures in Figure 8 



structure 




number 


number 


CPU time [seconds] 


(graph) 


n 


of cells 


of colors 


STABIL 


STABCOL 


1 


12 


2 


16 


0.04 


0.05 


2 


8 


3 


18 


0.03 


0.01 


3 


12 


7 


66 


0.05 


0.07 


4 


18 


6 


86 


0.08 


0.27 


5 


12 


12 


144 


0.05 


0.08 


6 


20 


1 


6 


0.06 


0.19 


7 


12 


3 


27 


0.04 


0.07 


8 


18 


8 


102 


0.07 


0.25 


9 


17 


17 


289 


0.06 


0.21 


10 


20 


3 


30 


0.07 


0.32 


11 


18 


8 


83 


0.07 


0.21 


12 


20 


10 


119 


0.09 


0.30 


13 


28 


27 


730 


0.20 


1.00 


14 


10 


5 


34 


0.04 


0.03 


15 


11 


3 


22 


0.04 


0.05 


16 


22 


8 


146 


0.13 


0.43 


17 


20 


6 


76 


0.08 


0.31 


18 


16 


10 


114 


0.06 


0.16 


19 


8 


2 


8 


0.03 


0.01 


20 


22 


16 


292 


0.14 


0.45 


21 


25 


5 


31 


0.08 


0.34 


22 


8 


8 


64 


0.03 


0.01 
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The program has also been tested on first members of three infinite families of graphs where the 
automorphism groups and the numbers of orbits on the vertices and ordered pairs of vertices 
are known. We give a description of these families. The results are shown in Tables II-IV. 

Benzene stacks. We denote by Pj~ the graph from this family consisting of n = Qk vertices. 
The vertices of Pk form k stages (strata), each stage (stratum) is a cycle of 6 vertices. Besides 
the edges in these cycles there are edges between stages. The graphs Pk, k = 2, 3, 4, are depicted 
in Figure 9. A formal description of the graphs Pk is the following. 

Let L = {a, b, c, d, e, /}, K = {1, 2, ... , k}, X{ = (x, i) for x € L, i G K. 

Then P k = (n(P k ),E(P k )), where n(P k ) = LxK, E(P k ) = U*=i Ri U \JjZl Qj, and 

Ri = {{ai, bi}, {bi,a}, {ci,di}, {di, ej, {e», ft}, {/», a;}}, 

r> =1 i{ a i' a i+il' i c i' c i+ii' i e i' e j+i}} : J = 2/ - 1 
^ \ {{b j ,b j+1 },{d j ,d j+1 },{f j J j+1 }} : J = 2/. 

It is known from [KliLP89] and from [KliLPZ92] that the automorphism group of Pk is isomor- 
phic to S3 x S2- Aut(Pk) has k orbits on the set fl(Pk) and Ak 2 orbits on the set fi(Pfc) x Q(Pk)- 



Table II. Results for benzene stacks 





number 


number 


CPU time [seconds] 


n 


of cells 


of colors 


STABIL 


STABCOL 


6 


1 


4 


0.03 


0.01 


12 


2 


16 


0.03 


0.08 


18 


3 


36 


0.08 


0.32 


24 


4 


64 


0.15 


0.88 


30 


5 


100 


0.28 


2.01 


36 


6 


144 


0.44 


4.83 


42 


7 


196 


0.79 


10.60 


48 


8 


256 


1.14 


14.90 


54 


9 


324 


1.71 


21.87 


60 


10 


400 


2.48 


24.73 


66 


11 


484 


3.39 


32.77 


72 


12 


576 


4.60 


58.35 


78 


13 


676 


6.64 


75.92 


102 


17 


1156 


16.21 


177.82 


126 


21 


1764 


35.45 


362.34 


150 


25 


2500 


65.74 




174 


29 


3364 


117.89 




198 


33 


4356 


190.83 
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Figure 9 

Mobius ladders. We denote by M\, = (Q(Mfc), E(Mk)) the graph with the set of n = 2k vertices 
f2(Mfc) = {ai, . . . , afe, afc+i, . . . , a2fc} and the set of edges E(M] S ) = a?} \ j — i = x (mod 
2k), x £ {l,k,2k — 1}}. For example, the graph M5 is depicted in Figure 10, which may serve 
as an explanation of the name. 

The symmetry of the graphs has been investigated in |KliKZ9(j] . |Sim86j . jWalSH88], 
|FarKM94j and |KliRRT99] . It was proved in [KliKZ90j that, for k > 3, the automorphism 
group of Mk is isomorphic to the dihedral group D2k- This group has one orbit on the set 
Q(M k ) and k + 1 orbits on the set £l(M k ) x 0(M fc ). 
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Table III. Results for Mdbius ladders 





1 

number 


1 

number 


CPU time [seconds] 


n 


of cells 


of colors 


STABIL 


STABCOL 


6 


1 


3 


0.03 


0.01 


12 


1 


7 


0.05 


0.08 


18 


1 


10 


0.07 


0.28 


24 


1 


13 


0.11 


0.80 


30 


1 


16 


0.21 


1.68 


36 


1 


19 


0.35 


3.99 


42 


1 


22 


0.57 


7.62 


48 


1 


25 


0.87 


12.34 


54 


1 


28 


1.27 


17.74 


60 


1 


31 


1.76 


24.34 


66 


1 


34 


Z.ot) 


OZ.D4 


I I 




6 1 


3.21 


46.22 


80 




41 


4.57 


66.16 


100 




51 


9.57 


133.30 


120 




61 


17.33 


232.40 


140 
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Figure 10 



Dynkin graphs. Let D n denote the tree with n vertices as depicted in Figure 11. For n > 4 
the automorphism group of D n is isomorphic to Z 2 . It has n — 1 orbits on the set of vertices 
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and n 2 — 2n + 2 orbits on the set of ordered pairs of vertices. 

Table IV. Results for Dynkin graphs 



number number CPU time [seconds] 



n 


of cells 


of colors 


STABIL 


STABCOL 


6 


5 


26 


0.03 


0.01 


12 


11 


122 


0.09 


0.10 


18 


17 


290 


0.14 


0.36 


24 


23 


530 


0.25 


1.05 


30 


29 


842 


0.43 


2.28 


36 


35 


1226 


0.79 
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Figure 11 

The program STABIL has a rather long history. The first attempt of an implementation was 
done by E.V. Krukovskaya in PASCAL, see [KliK90] . A draft version of the present program was 
written in C by I.V. Chuvaeva and D.V. Pasechnik at the N. D. Zelinskii Institute of Organic 
Chemistry (Moscow) in 1990-1992, see [ChuKP92]. Finally this version was improved at the 
Technical University Munich in 1995. The improved version had static memory and, by this 
reason, was available only to graphs with up to 200 vertices. This version was carefully tested and 
the results of this testing are presented above. In 1996 new improvements were done according 
to the suggestions of Ch. Pech (Dresden): dynamical memory management was created. Now 
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the current version, in principle, can handle graphs with an arbitrary number of vertices. If for a 
given graph T the number of vertices is sufficiently small (that is if there will be enough memory 
for saving all data structures) then we will get W(r) as output. Otherwise, the program will 
inform the user that the task cannot be fulfilled completely. 

This last version of the program was used for other purely theoretical purposes. Our experience 
shows that graphs with up to 500 vertices can be successfully managed, however in some cases 
we were able to handle even larger graphs. 

The codes of both programs STABIL and STABCOL and a read. me file which describes how to 
use the programs are released under GPLv3 and can be downloaded from the homepage of the 
fourth author: http://www.ntu.edu.sg/home/dima/software.htm, 

The users of the programs are requested to reference this report whenever results obtained with 
help of STABIL or STABCOL are published. 

10 Discussion 

The presented algorithm provides a very powerful and efficient tool to determine equivalence of 
atoms and pairs of atoms in molecules. The equivalence classes are obtained by examining in 
a systematic way all configurations of three vertices in the underlying graph. The partitions of 
the vertices and edges are the finest which can be deduced using configurations of this size. 

As already mentioned before, the standard partition of a graph not necessarily coincides with 
its automorphism partition. Indeed, there exist graphs where the former partition is coarser 
than the latter. Methods which settle this shortcoming to a certain extent are based on the 
following idea. Classify vertices and edges by examining configurations which consist not only 
of three but of a larger number of vertices. This proceeding is generally called deep stabilization 
or stabilization of depth t. 

Roughly speaking, the situation is the following. Let T be a graph with n vertices (possibly a 
directed multigraph) and t a fixed integer, 2 < t < n. All possible n l subgraphs of V which are 
induced by the ordered i-tuples of vertices are examined. We have to find all \x isomorphism 
types of these subgraphs. To each pair (u, v) of vertices a vector of length /x is associated, 
each component of the vector being equal to the number of subgraphs of the corresponding 
isomorphism type which contain the pair (u,v). Any iteration of the stabilization procedure of 
depth t assigns two pairs (u, v) and (u' , v') the same color if and only if the vectors corresponding 
to these pairs are equal. It is clear that the computation of the total degree partition is nothing 
else than stabilization of depth 2, Weisfeiler-Leman stabilization has depth 3. For depth at least 
4 we obtain stronger algorithms, however at the price of a considerably higher complexity. One 
of the first attempts of a program implementation of stabilization of depth t > 4 for purely 
chemical goals and on a rather "naive" level was done in |RueR91]. 

It is demonstrated in the next paper [TinK99] of this series that, in contrast to first expectations, 
stabilization of depth t with some t > 4 is also not sufficient to rigorously settle the automorphism 
partitioning problem. It turns out (see [Fur87], [CaiFI92]) that for any fixed value of t there 
exist graphs with the property that the standard partition of depth t does not coincide with the 
automorphism partition. 
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In future work we intend to develop an implementation of the WL-stabilization which eventually 
is even faster than the implementation STABIL presented in this paper. A very promising 
approach is to perform in an alternative way stabilization steps of depth 2 and 3. Given a 
colored complete directed graph A = (Q,R), we start with stabilization of depth 2, i.e. we 
compute the total degree partition of A (let A& be the graph consisting of the arcs of color k; 
then the total degree partition of A is the coarsest partition of V such that any two vertices 
belonging to the same cell of the partition have the same valencies with respect to any other cell 
in any graph A/%). The arcs of a given color are recolored according to the colors of their end 
vertices such that arcs between different colored pairs of vertices are assigned different colors. 
In the next step the coloring of the arcs and vertices is refined analogously as in STABIL by 
considering all triangles of the graph. However, in order to decrease the effort, this is not done 
iteratively, but only once. After that we again compute the total degree partition, perform one 
stabilization step of depth 3, etc. The algorithm stops if the coloring of A is stable. 

As it was mentioned before, we can only be sure to get the automorphism partition of a graph 
r by means of WL-stabilization if it is known in advance that the algebra W(T) is Schurian. 

In general, we can only suggest to proceed in the following way: 
find the automorphism group G = Aut(T) of the graph T; 

describe the set of 2-orbits (or only 1 -orbits) of the action of G on the vertex set of T. 

This problem, in principle, may be solved using e.g. the computer package COCO (LA. Faradzev, 
M.H. Klin), the UNIX implementation by A.E. Brouwer. 

The preliminary versions of COCO are described in [FarK9l] and [FarKM94]. With the use 
of COCO one may handle graphs with a few thousands of vertices. The current version of 
COCO [COCO| is oriented towards purely mathematical goals, namely for the investigation of 
graphs having a sufficiently large prescribed subgroup of the automorphism group. In such a 
case the input graph is described by a set of arcs as a union of suitable 2-orbits of a prescribed 
permutation group. For purely chemical purposes such a mode of input is certainly inconvenient. 
Hopefully, in the future a more suitable interface for chemists will be created. 

This technical report is considered by the authors as a preliminary version of a future regular 
publication. We will be very grateful to everybody who supplies us with remarks, comments, 
criticism or improvements. 

Putting the original version of [ BabCKP97| onto arXiV, we just updated few references, while 
not trying to reflect the ongoing progress during the last decade. The reader may benefit from 
recent papers |EvdPT001 I Coh KM08 . EvdP09], where some new relevant ideas and methods are 
discussed. The authors still hope to produce a throughout revision of the present text. 

Unless more convenient, M. Klin should be regarded as the corresponding author. 
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